Overview

Dataset statistics

Number of variables: 16
Number of observations: 16554
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 MiB
Average record size in memory: 104.0 B

Variable types

Numeric: 9
Categorical: 7

Alerts

grade is highly correlated with bathrooms and 4 other fields (High correlation)
bathrooms is highly correlated with grade and 2 other fields (High correlation)
bedrooms is highly correlated with grade and 2 other fields (High correlation)
sqft_living15 is highly correlated with grade and 2 other fields (High correlation)
floors is highly correlated with antiguedad_venta (High correlation)
sqft_lot is highly correlated with sqft_lot15 (High correlation)
price is highly correlated with grade and 1 other field (High correlation)
sqft_lot15 is highly correlated with zipcode and 2 other fields (High correlation)
sqft_living is highly correlated with grade and 4 other fields (High correlation)
antiguedad_venta is highly correlated with zipcode and 4 other fields (High correlation)
view is highly correlated with waterfront (High correlation)
waterfront is highly correlated with view (High correlation)
zipcode is highly correlated with sqft_lot15 and 1 other field (High correlation)
condition is highly correlated with antiguedad_venta (High correlation)
df_index has unique values (Unique)
antiguedad_venta has 320 (1.9%) zeros (Zeros)

Reproduction

Analysis started: 2022-10-02 21:55:15.270504
Analysis finished: 2022-10-02 21:55:32.535428
Duration: 17.26 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 16554
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18811.53262
Minimum: 1
Maximum: 113866
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1130.65
Q1: 6188.5
Median: 14556.5
Q3: 27262.5
95-th percentile: 51274.25
Maximum: 113866
Range: 113865
Interquartile range (IQR): 21074

Descriptive statistics

Standard deviation: 16147.10472
Coefficient of variation (CV): 0.8583619978
Kurtosis: 1.628791243
Mean: 18811.53262
Median Absolute Deviation (MAD): 9673.5
Skewness: 1.265459165
Sum: 311406111
Variance: 260728990.9
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                 Count  Frequency (%)
19857                 1      < 0.1%
41861                 1      < 0.1%
27591                 1      < 0.1%
24402                 1      < 0.1%
5679                  1      < 0.1%
51523                 1      < 0.1%
23241                 1      < 0.1%
6184                  1      < 0.1%
22233                 1      < 0.1%
6765                  1      < 0.1%
Other values (16544)  16544  99.9%

Minimum 10 values

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
5      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%
12     1      < 0.1%
14     1      < 0.1%
15     1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
113866  1      < 0.1%
111906  1      < 0.1%
109571  1      < 0.1%
108311  1      < 0.1%
99297   1      < 0.1%
98195   1      < 0.1%
98015   1      < 0.1%
94768   1      < 0.1%
94083   1      < 0.1%
93739   1      < 0.1%
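
The quantile and descriptive statistics reported for each numeric variable follow standard definitions. As a minimal sketch (not the report's own code), the standard-library Python below recomputes several of them for a small list. The function name `describe` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) and linearly interpolated quartiles are assumptions about how the report computes its figures.

```python
import statistics


def describe(values):
    """Recompute a few report-style statistics for a list of numbers."""
    data = sorted(values)
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # assumption: sample std (n - 1 denominator)
    median = statistics.median(data)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median([abs(v - median) for v in data])
    # Assumption: quartiles use linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "range": data[-1] - data[0],  # maximum minus minimum
        "iqr": q3 - q1,               # interquartile range
        "cv": std / mean,             # coefficient of variation
        "variance": std ** 2,
        "mad": mad,
    }


print(describe([1, 2, 3, 4, 5]))
```

The same relationships can be checked against the df_index figures: 113866 - 1 = 113865 (range), 27262.5 - 6188.5 = 21074 (IQR), and 16147.10472 / 18811.53262 ≈ 0.8584 (CV).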

zipcode
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 70
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 98079.01027
Minimum: 98001
Maximum: 98199
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 64.8 KiB

Quantile statistics

Minimum: 98001
5-th percentile: 98004
Q1: 98033
Median: 98070
Q3: 98118
95-th percentile: 98177
Maximum: 98199
Range: 198
Interquartile range (IQR): 85

Descriptive statistics

Standard deviation: 53.58031141
Coefficient of variation (CV): 0.0005462974317
Kurtosis: -0.88492992
Mean: 98079.01027
Median Absolute Deviation (MAD): 43
Skewness: 0.3725473414
Sum: 1623599936
Variance: 2870.849771
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value              Count  Frequency (%)
98103              477    2.9%
98115              476    2.9%
98052              467    2.8%
98034              446    2.7%
98117              442    2.7%
98038              431    2.6%
98042              431    2.6%
98118              406    2.5%
98133              402    2.4%
98023              399    2.4%
Other values (60)  12177  73.6%

Minimum 10 values

Value  Count  Frequency (%)
98001  291    1.8%
98002  160    1.0%
98003  213    1.3%
98004  205    1.2%
98005  132    0.8%
98006  377    2.3%
98007  113    0.7%
98008  226    1.4%
98010  69     0.4%
98011  150    0.9%

Maximum 10 values

Value  Count  Frequency (%)
98199  234    1.4%
98198  220    1.3%
98188  108    0.7%
98178  211    1.3%
98177  192    1.2%
98168  218    1.3%
98166  200    1.2%
98155  372    2.2%
98148  51     0.3%
98146  228    1.4%

grade
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.583967621
Minimum: 1
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 64.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 6
Q1: 7
Median: 7
Q3: 8
95-th percentile: 10
Maximum: 13
Range: 12
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.103965081
Coefficient of variation (CV): 0.1455656374
Kurtosis: 1.371933156
Mean: 7.583967621
Median Absolute Deviation (MAD): 1
Skewness: 0.7165229669
Sum: 125545
Variance: 1.2187389
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=12)]
Common values

Value             Count  Frequency (%)
7                 7145   43.2%
8                 4770   28.8%
9                 1881   11.4%
6                 1592   9.6%
10                698    4.2%
11                211    1.3%
5                 183    1.1%
12                40     0.2%
4                 23     0.1%
13                7      < 0.1%
Other values (2)  4      < 0.1%

Minimum 10 values

Value  Count  Frequency (%)
1      1      < 0.1%
3      3      < 0.1%
4      23     0.1%
5      183    1.1%
6      1592   9.6%
7      7145   43.2%
8      4770   28.8%
9      1881   11.4%
10     698    4.2%
11     211    1.3%

Maximum 10 values

Value  Count  Frequency (%)
13     7      < 0.1%
12     40     0.2%
11     211    1.3%
10     698    4.2%
9      1881   11.4%
8      4770   28.8%
7      7145   43.2%
6      1592   9.6%
5      183    1.1%
4      23     0.1%

view
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 937.8 KiB

Value  Count
0      15123
2      666
3      324
1      251
4      190

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 16554
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
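
As a rough illustration of how such character properties can be tallied, the sketch below counts the Unicode general category of every character in a list of string values using Python's standard `unicodedata` module. The helper name `char_category_counts` is hypothetical and this is not the report's own implementation. Note that `unicodedata.category` returns two-letter abbreviations, for example `Nd` for "Decimal Number" and `Po` for "Other Punctuation".

```python
import unicodedata
from collections import Counter


def char_category_counts(values):
    """Count Unicode general categories over all characters of all values."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            # e.g. "Nd" (Decimal Number) for digits, "Po" (Other Punctuation) for "."
            counts[unicodedata.category(ch)] += 1
    return counts


print(char_category_counts(["2.0", "1.0"]))
```

For values such as "2.0", two of every three characters are decimal digits and one is punctuation, which is the 66.7% / 33.3% split the report shows for the three-character categorical variables.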

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      15123  91.4%
2      666    4.0%
3      324    2.0%
1      251    1.5%
4      190    1.1%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value  Count  Frequency (%)
0      15123  91.4%
2      666    4.0%
3      324    2.0%
1      251    1.5%
4      190    1.1%

Most occurring characters

Value  Count  Frequency (%)
0      15123  91.4%
2      666    4.0%
3      324    2.0%
1      251    1.5%
4      190    1.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  16554  100.0%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      15123  91.4%
2      666    4.0%
3      324    2.0%
1      251    1.5%
4      190    1.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  16554  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      15123  91.4%
2      666    4.0%
3      324    2.0%
1      251    1.5%
4      190    1.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16554  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
0      15123  91.4%
2      666    4.0%
3      324    2.0%
1      251    1.5%
4      190    1.1%

bathrooms
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 970.1 KiB

Value  Count
2.0    8224
1.0    6635
3.0    1504
4.0    191

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49662
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 1.0
3rd row: 2.0
4th row: 1.0
5th row: 2.0

Common Values

Value  Count  Frequency (%)
2.0    8224   49.7%
1.0    6635   40.1%
3.0    1504   9.1%
4.0    191    1.2%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value  Count  Frequency (%)
2.0    8224   49.7%
1.0    6635   40.1%
3.0    1504   9.1%
4.0    191    1.2%

Most occurring characters

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
2      8224   16.6%
1      6635   13.4%
3      1504   3.0%
4      191    0.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     33108  66.7%
Other Punctuation  16554  33.3%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      16554  50.0%
2      8224   24.8%
1      6635   20.0%
3      1504   4.5%
4      191    0.6%

Other Punctuation

Value  Count  Frequency (%)
.      16554  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49662  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
2      8224   16.6%
1      6635   13.4%
3      1504   3.0%
4      191    0.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49662  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
2      8224   16.6%
1      6635   13.4%
3      1504   3.0%
4      191    0.4%

bedrooms
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 970.1 KiB

Value  Count
3.0    7846
4.0    5208
2.0    2163
5.0    1176
1.0    161

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49662
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.0
2nd row: 3.0
3rd row: 4.0
4th row: 5.0
5th row: 3.0

Common Values

Value  Count  Frequency (%)
3.0    7846   47.4%
4.0    5208   31.5%
2.0    2163   13.1%
5.0    1176   7.1%
1.0    161    1.0%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value  Count  Frequency (%)
3.0    7846   47.4%
4.0    5208   31.5%
2.0    2163   13.1%
5.0    1176   7.1%
1.0    161    1.0%

Most occurring characters

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
3      7846   15.8%
4      5208   10.5%
2      2163   4.4%
5      1176   2.4%
1      161    0.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     33108  66.7%
Other Punctuation  16554  33.3%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      16554  50.0%
3      7846   23.7%
4      5208   15.7%
2      2163   6.5%
5      1176   3.6%
1      161    0.5%

Other Punctuation

Value  Count  Frequency (%)
.      16554  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49662  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
3      7846   15.8%
4      5208   10.5%
2      2163   4.4%
5      1176   2.4%
1      161    0.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49662  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
3      7846   15.8%
4      5208   10.5%
2      2163   4.4%
5      1176   2.4%
1      161    0.3%

sqft_living15
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 679
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1944.15084
Minimum: 460
Maximum: 5790
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.5 KiB

Quantile statistics

Minimum: 460
5-th percentile: 1130
Q1: 1470
Median: 1810
Q3: 2300
95-th percentile: 3190
Maximum: 5790
Range: 5330
Interquartile range (IQR): 830

Descriptive statistics

Standard deviation: 649.2374266
Coefficient of variation (CV): 0.3339439581
Kurtosis: 1.455088632
Mean: 1944.15084
Median Absolute Deviation (MAD): 390
Skewness: 1.059620425
Sum: 32183473
Variance: 421509.2361
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value               Count  Frequency (%)
1440                160    1.0%
1560                159    1.0%
1540                155    0.9%
1500                150    0.9%
1460                144    0.9%
1720                138    0.8%
1580                137    0.8%
1480                134    0.8%
1610                134    0.8%
1520                134    0.8%
Other values (669)  15109  91.3%

Minimum 10 values

Value  Count  Frequency (%)
460    1      < 0.1%
620    2      < 0.1%
670    1      < 0.1%
690    2      < 0.1%
700    2      < 0.1%
710    1      < 0.1%
720    2      < 0.1%
740    5      < 0.1%
750    1      < 0.1%
760    2      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
5790   5      < 0.1%
5600   1      < 0.1%
5380   1      < 0.1%
5330   1      < 0.1%
5220   1      < 0.1%
5080   1      < 0.1%
5070   1      < 0.1%
4950   1      < 0.1%
4930   1      < 0.1%
4913   1      < 0.1%

waterfront
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 937.8 KiB

Value  Count
0      16459
1      95

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 16554
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      16459  99.4%
1      95     0.6%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value  Count  Frequency (%)
0      16459  99.4%
1      95     0.6%

Most occurring characters

Value  Count  Frequency (%)
0      16459  99.4%
1      95     0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  16554  100.0%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      16459  99.4%
1      95     0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  16554  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      16459  99.4%
1      95     0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16554  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
0      16459  99.4%
1      95     0.6%

floors
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 970.1 KiB

Value  Count
1.0    9811
2.0    6254
3.0    489

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 49662
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 1.0
3rd row: 2.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count  Frequency (%)
1.0    9811   59.3%
2.0    6254   37.8%
3.0    489    3.0%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value  Count  Frequency (%)
1.0    9811   59.3%
2.0    6254   37.8%
3.0    489    3.0%

Most occurring characters

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
1      9811   19.8%
2      6254   12.6%
3      489    1.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     33108  66.7%
Other Punctuation  16554  33.3%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      16554  50.0%
1      9811   29.6%
2      6254   18.9%
3      489    1.5%

Other Punctuation

Value  Count  Frequency (%)
.      16554  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  49662  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
1      9811   19.8%
2      6254   12.6%
3      489    1.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49662  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
.      16554  33.3%
0      16554  33.3%
1      9811   19.8%
2      6254   12.6%
3      489    1.0%

sqft_lot
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7871
Distinct (%): 47.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9934.879667
Minimum: 520
Maximum: 137214
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.5 KiB

Quantile statistics

Minimum: 520
5-th percentile: 1715.65
Q1: 5000
Median: 7480
Q3: 10140
95-th percentile: 29944
Maximum: 137214
Range: 136694
Interquartile range (IQR): 5140

Descriptive statistics

Standard deviation: 10957.36316
Coefficient of variation (CV): 1.102918559
Kurtosis: 31.24093454
Mean: 9934.879667
Median Absolute Deviation (MAD): 2507.5
Skewness: 4.698176897
Sum: 164461998
Variance: 120063807.5
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                Count  Frequency (%)
5000                 289    1.7%
6000                 221    1.3%
4000                 197    1.2%
7200                 174    1.1%
4800                 97     0.6%
7500                 95     0.6%
4500                 93     0.6%
9600                 91     0.5%
8400                 88     0.5%
3600                 79     0.5%
Other values (7861)  15130  91.4%

Minimum 10 values

Value  Count  Frequency (%)
520    1      < 0.1%
600    1      < 0.1%
609    1      < 0.1%
635    1      < 0.1%
638    1      < 0.1%
649    2      < 0.1%
651    1      < 0.1%
676    1      < 0.1%
681    1      < 0.1%
683    1      < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
137214  1      < 0.1%
136915  1      < 0.1%
136778  1      < 0.1%
136290  1      < 0.1%
130680  1      < 0.1%
130017  1      < 0.1%
127631  1      < 0.1%
125452  1      < 0.1%
122038  1      < 0.1%
120661  1      < 0.1%

price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3237
Distinct (%): 19.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 510603.4272
Minimum: 75000
Maximum: 7700000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.5 KiB

Quantile statistics

Minimum: 75000
5-th percentile: 210000
Q1: 316000
Median: 440000
Q3: 620000
95-th percentile: 963000
Maximum: 7700000
Range: 7625000
Interquartile range (IQR): 304000

Descriptive statistics

Standard deviation: 323805.8088
Coefficient of variation (CV): 0.634163015
Kurtosis: 50.59509917
Mean: 510603.4272
Median Absolute Deviation (MAD): 141000
Skewness: 4.658301218
Sum: 8452529134
Variance: 1.048502018 × 10^11
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                Count  Frequency (%)
350000               137    0.8%
450000               133    0.8%
425000               128    0.8%
550000               126    0.8%
500000               123    0.7%
375000               116    0.7%
325000               114    0.7%
300000               109    0.7%
400000               108    0.7%
250000               105    0.6%
Other values (3227)  15355  92.8%

Minimum 10 values

Value  Count  Frequency (%)
75000  1      < 0.1%
78000  1      < 0.1%
80000  1      < 0.1%
81000  1      < 0.1%
82000  1      < 0.1%
82500  1      < 0.1%
83000  1      < 0.1%
84000  1      < 0.1%
85000  2      < 0.1%
89000  1      < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
7700000  1      < 0.1%
7062500  1      < 0.1%
5570000  1      < 0.1%
5300000  1      < 0.1%
5110800  1      < 0.1%
4500000  1      < 0.1%
4000000  1      < 0.1%
3850000  1      < 0.1%
3800000  2      < 0.1%
3710000  1      < 0.1%

condition
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 937.8 KiB

Value  Count
3      10763
4      4357
5      1285
2      126
1      23

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 16554
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 3
3rd row: 3
4th row: 3
5th row: 3

Common Values

Value  Count  Frequency (%)
3      10763  65.0%
4      4357   26.3%
5      1285   7.8%
2      126    0.8%
1      23     0.1%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value  Count  Frequency (%)
3      10763  65.0%
4      4357   26.3%
5      1285   7.8%
2      126    0.8%
1      23     0.1%

Most occurring characters

Value  Count  Frequency (%)
3      10763  65.0%
4      4357   26.3%
5      1285   7.8%
2      126    0.8%
1      23     0.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  16554  100.0%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
3      10763  65.0%
4      4357   26.3%
5      1285   7.8%
2      126    0.8%
1      23     0.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  16554  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
3      10763  65.0%
4      4357   26.3%
5      1285   7.8%
2      126    0.8%
1      23     0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16554  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
3      10763  65.0%
4      4357   26.3%
5      1285   7.8%
2      126    0.8%
1      23     0.1%

sqft_lot15
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7013
Distinct (%): 42.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8996.247856
Minimum: 659
Maximum: 57140
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.5 KiB

Quantile statistics

Minimum: 659
5-th percentile: 1921.95
Q1: 5011.25
Median: 7500
Q3: 9750
95-th percentile: 23045.85
Maximum: 57140
Range: 56481
Interquartile range (IQR): 4738.75

Descriptive statistics

Standard deviation: 7636.814694
Coefficient of variation (CV): 0.848888872
Kurtosis: 11.84745816
Mean: 8996.247856
Median Absolute Deviation (MAD): 2400
Skewness: 3.168057171
Sum: 148923887
Variance: 58320938.67
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value                Count  Frequency (%)
5000                 339    2.0%
4000                 285    1.7%
6000                 230    1.4%
7200                 171    1.0%
7500                 111    0.7%
4800                 108    0.7%
4500                 94     0.6%
8400                 90     0.5%
3600                 89     0.5%
4080                 82     0.5%
Other values (7003)  14955  90.3%

Minimum 10 values

Value  Count  Frequency (%)
659    1      < 0.1%
660    1      < 0.1%
748    1      < 0.1%
750    3      < 0.1%
755    1      < 0.1%
758    1      < 0.1%
794    1      < 0.1%
810    2      < 0.1%
886    3      < 0.1%
887    1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
57140  1      < 0.1%
57063  2      < 0.1%
57000  1      < 0.1%
56827  1      < 0.1%
56628  6      < 0.1%
56568  1      < 0.1%
56192  2      < 0.1%
55657  1      < 0.1%
55322  1      < 0.1%
55023  1      < 0.1%

sqft_living
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 871
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.595143
Minimum: 290
Maximum: 12050
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 129.5 KiB

Quantile statistics

Minimum: 290
5-th percentile: 920
Q1: 1400
Median: 1880
Q3: 2478.75
95-th percentile: 3560
Maximum: 12050
Range: 11760
Interquartile range (IQR): 1078.75

Descriptive statistics

Standard deviation: 848.7053985
Coefficient of variation (CV): 0.4208605785
Kurtosis: 4.451354919
Mean: 2016.595143
Median Absolute Deviation (MAD): 520
Skewness: 1.323768322
Sum: 33382716
Variance: 720300.8534
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value               Count  Frequency (%)
1440                111    0.7%
1400                110    0.7%
1300                106    0.6%
1540                102    0.6%
1480                102    0.6%
1010                99     0.6%
1660                99     0.6%
1200                99     0.6%
1900                98     0.6%
1560                98     0.6%
Other values (861)  15530  93.8%

Minimum 10 values

Value  Count  Frequency (%)
290    1      < 0.1%
380    1      < 0.1%
390    1      < 0.1%
420    2      < 0.1%
430    1      < 0.1%
440    1      < 0.1%
470    2      < 0.1%
480    2      < 0.1%
490    1      < 0.1%
500    1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
12050  1      < 0.1%
10040  1      < 0.1%
9200   1      < 0.1%
8020   1      < 0.1%
8010   1      < 0.1%
7710   1      < 0.1%
7620   1      < 0.1%
7480   1      < 0.1%
7390   1      < 0.1%
7350   1      < 0.1%

fue_renovada
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 937.8 KiB

Value  Count
0      15905
1      649

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 16554
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      15905  96.1%
1      649    3.9%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]

Value  Count  Frequency (%)
0      15905  96.1%
1      649    3.9%

Most occurring characters

Value  Count  Frequency (%)
0      15905  96.1%
1      649    3.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  16554  100.0%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      15905  96.1%
1      649    3.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  16554  100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      15905  96.1%
1      649    3.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16554  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
0      15905  96.1%
1      649    3.9%

antiguedad_venta
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 117
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.64552374
Minimum: -1
Maximum: 115
Zeros: 320
Zeros (%): 1.9%
Negative: 12
Negative (%): 0.1%
Memory size: 129.5 KiB

Quantile statistics

Minimum: -1
5-th percentile: 4
Q1: 18
Median: 40
Q3: 63
95-th percentile: 99
Maximum: 115
Range: 116
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 29.34580409
Coefficient of variation (CV): 0.6723668678
Kurtosis: -0.6696125664
Mean: 43.64552374
Median Absolute Deviation (MAD): 23
Skewness: 0.4480073194
Sum: 722508
Variance: 861.1762178
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Value               Count  Frequency (%)
9                   355    2.1%
11                  344    2.1%
8                   338    2.0%
10                  330    2.0%
0                   320    1.9%
37                  314    1.9%
7                   300    1.8%
36                  293    1.8%
46                  277    1.7%
47                  274    1.7%
Other values (107)  13409  81.0%

Minimum 10 values

Value  Count  Frequency (%)
-1     12     0.1%
0      320    1.9%
1      220    1.3%
2      133    0.8%
3      120    0.7%
4      102    0.6%
5      150    0.9%
6      250    1.5%
7      300    1.8%
8      338    2.0%

Maximum 10 values

Value  Count  Frequency (%)
115    21     0.1%
114    47     0.3%
113    23     0.1%
112    27     0.2%
111    39     0.2%
110    42     0.3%
109    51     0.3%
108    64     0.4%
107    71     0.4%
106    52     0.3%

Interactions

[Figures: 81 pairwise interaction plots; images not preserved]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
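
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the helper names (`average_ranks`, `pearson`, `spearman`) and the toy values are assumptions made for the example. Tied values receive the mean of the rank positions they occupy, which is a common convention.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)   # monotonic relationship, so rho reaches its maximum
r = pearson(x, y)      # the curve is not a straight line, so r stays below 1
```

On this toy data ρ reaches +1 because the ranks of x and y agree exactly, while r is lower because the relationship is curved.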

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
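
As a rough sketch of this formula, the snippet below computes r by hand and checks the location and scale invariance mentioned above. The function name `pearson_r` and the rescaling constants are assumptions for illustration; the input values are the `sqft_living` and `price` entries of the first five sample rows shown in this report.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# sqft_living and price from the first five sample rows of the report.
sqft_living = [2610.0, 2210.0, 2650.0, 1950.0, 1630.0]
price = [810000.0, 685000.0, 725000.0, 274000.0, 445000.0]

r_original = pearson_r(sqft_living, price)

# Shift and rescale each variable separately (positive scale factors,
# arbitrary constants chosen only for the demonstration).
sqft_rescaled = [0.5 * a + 100.0 for a in sqft_living]
price_rescaled = [b / 1000.0 - 50.0 for b in price]
r_rescaled = pearson_r(sqft_rescaled, price_rescaled)

# Up to floating-point rounding, the two coefficients are identical.
print(r_original, r_rescaled)
```

Five rows are far too few to say anything about the dataset; the point is only that rescaling leaves r unchanged.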

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
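
The pair counting can be sketched directly. This is a minimal illustration of the definition as stated in the text, with tied pairs counted as neither concordant nor discordant; the function name `kendall_tau` and the toy data are assumptions. Many libraries report a tie-adjusted variant (often called τ-b), so their output can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y; it counts for neither side.
    return (concordant - discordant) / len(pairs)


# Toy data (hypothetical): one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau(x, y)            # 5 concordant, 1 discordant -> 4/6
tau_reversed = kendall_tau(x, [4, 3, 2, 1])  # every pair discordant -> -1
```

The double loop over pairs is quadratic in the number of observations, which is fine for a demonstration but slow for a dataset of this size.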

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The usual empirical estimators of Cramér's V have been shown to be biased, even for large samples, so the report uses the bias-corrected measure proposed by Bergsma in 2013.
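
A sketch of a bias-corrected Cramér's V follows. It assumes the correction takes the commonly cited form attributed to Bergsma (2013): the chi-squared based φ² is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the row and column counts r and k are shrunk in the same spirit. The function name `cramers_v_corrected` and the toy data are hypothetical, and this is not necessarily the exact code path used to produce the report.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(a)
    joint = Counter(zip(a, b))       # observed cell counts
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_value, row_count in row_totals.items():
        for col_value, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_value, col_value), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Assumed bias correction (see lead-in).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return 0.0  # a constant variable carries no association
    return sqrt(phi2_corr / denominator)


# Toy data (hypothetical labels).
a_perfect = ["x", "y"] * 50
b_perfect = ["p", "q"] * 50                      # b is fully determined by a
v_perfect = cramers_v_corrected(a_perfect, b_perfect)

a_indep = ["x", "x", "y", "y"] * 25
b_indep = ["p", "q", "p", "q"] * 25              # every cell equally frequent
v_indep = cramers_v_corrected(a_indep, b_indep)
```

With perfectly associated labels the coefficient is 1 up to floating-point rounding, and with evenly spread cells it is 0.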

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

       df_index  zipcode  grade  view  bathrooms  bedrooms  sqft_living15  waterfront  floors  sqft_lot     price  condition  sqft_lot15  sqft_living  fue_renovada  antiguedad_venta
    0     19857    98006     10     0        2.0       3.0         3140.0           0     2.0    8481.0  810000.0          3     10008.0       2610.0             0              22.0
    1     14014    98033      8     1        1.0       3.0         2210.0           0     1.0    8955.0  685000.0          3      8976.0       2210.0             0              41.0
    2     32909    98005      8     0        2.0       4.0         2230.0           0     2.0   18295.0  725000.0          3     19856.0       2650.0             0              28.0
    3     16305    98001      7     0        1.0       5.0         1660.0           0     1.0    8720.0  274000.0          3      8030.0       1950.0             0              53.0
    4      6647    98011      7     0        2.0       3.0         1620.0           0     1.0    6449.0  445000.0          3      7429.0       1630.0             0              29.0
    5      5865    98040      8     0        2.0       4.0         2550.0           0     1.0    8760.0  762500.0          4     10376.0       2610.0             0              36.0
    6      8009    98004      8     1        1.0       3.0         2630.0           0     1.0   14133.0  979000.0          4     17376.0       1700.0             0              60.0
    7      4731    98011      8     0        3.0       5.0         2640.0           0     2.0    4369.0  540000.0          3      4610.0       2870.0             0               7.0
    8     38480    98052      9     0        2.0       4.0         2730.0           0     2.0    8810.0  690000.0          3      5100.0       2700.0             0              10.0
    9     13246    98072      7     0        1.0       3.0         1260.0           0     1.0    9673.0  375000.0          3      9681.0       1660.0             0              38.0
91324698072701.03.01260.001.09673.0375000.039681.01660.0038.0

Last rows

       df_index  zipcode  grade  view  bathrooms  bedrooms  sqft_living15  waterfront  floors  sqft_lot     price  condition  sqft_lot15  sqft_living  fue_renovada  antiguedad_venta
16544      9302    98148      7     0        1.0       2.0         1890.0           0     1.0    6000.0  246500.0          2      8547.0        940.0             0              61.0
16545     26872    98008      6     0        1.0       3.0         1210.0           0     1.0    8000.0  475000.0          4      7875.0       1270.0             0              55.0
16546     49558    98198      6     2        1.0       2.0         1380.0           0     1.0    8925.0  175000.0          3      7440.0       1170.0             0             103.0
16547       146    98117      6     0        1.0       2.0          980.0           0     1.0    2130.0  400000.0          4      2800.0        980.0             0              96.0
16548      9396    98065      7     0        2.0       3.0         2190.0           0     2.0    7263.0  409000.0          3      5900.0       1950.0             0               7.0
16549     14466    98198      7     0        2.0       4.0         1630.0           0     2.0    6000.0  175000.0          3      6000.0       1780.0             0              23.0
16550     30056    98042      6     0        1.0       3.0          920.0           0     1.0    5525.0  191000.0          5      5330.0        840.0             0              46.0
16551      5824    98106      7     0        2.0       3.0         1780.0           0     1.0    6771.0  310000.0          3      6771.0       1780.0             0              24.0
16552     16712    98038      7     0        2.0       3.0         1060.0           0     2.0    3011.0  230000.0          3      3232.0       1340.0             0              19.0
16553       237    98075     10     0        2.0       3.0         2970.0           0     2.0    7857.0  800000.0          3      7857.0       3240.0             0              20.0